{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ec4ea046-ac11-4cb8-9531-b6a90fea6ecb",
   "metadata": {},
   "source": [
    "# L3: Projections\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63e0c89c-3280-4cda-9235-94f1a126936c",
   "metadata": {},
   "source": [
    "<p style=\"background-color:#fff6e4; padding:15px; border-width:3px; border-color:#f5ecda; border-style:solid; border-radius:6px\"> ⏳ <b>Note <code>(Kernel Starting)</code>:</b> This notebook takes about 30 seconds to be ready to use. You may start and watch the video while you wait.</p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44c86444-65a5-40e4-b65a-f86fb84662c4",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "# Warning control\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1269d100-d775-44d0-9198-25a7194234f2",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "import custom_utils "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e955321-63a1-49f8-8c0b-50a590662c47",
   "metadata": {},
   "source": [
    "<p style=\"background-color:#fff6ff; padding:15px; border-width:3px; border-color:#efe6ef; border-style:solid; border-radius:6px\"> 💻 &nbsp; <b>Access <code>requirements.txt</code> and <code>utils</code> files:</b> To access <code>requirements.txt</code> for this notebook, 1) click on the <em>\"File\"</em> option on the top menu of the notebook and then 2) click on <em>\"Open\"</em>. For more help, please see the <em>\"Appendix - Tips and Help\"</em> Lesson.</p>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad100d3a-2878-49d8-9149-7cf17b762ae3",
   "metadata": {},
   "source": [
    "## Data Loading"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6ba54c1f-108c-49f2-bcdb-234c06df3eaa",
   "metadata": {
    "height": 166
   },
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "import pandas as pd\n",
    "\n",
    "dataset = load_dataset(\"MongoDB/airbnb_embeddings\", streaming=True, split=\"train\")\n",
    "dataset = dataset.take(100)\n",
    "# Convert the dataset to a pandas dataframe\n",
    "dataset_df = pd.DataFrame(dataset)\n",
    "dataset_df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cef8a3ce-e87d-49f3-a935-530cb54d6f2e",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "print(\"Columns:\", dataset_df.columns)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75cabb79-54bd-443f-8edc-ba80ec529be5",
   "metadata": {},
   "source": [
    "## Document Modelling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03e640da-4906-4ade-8f0e-9559f7239575",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "listings = custom_utils.process_records(dataset_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d8dc7e1-d108-4a40-8294-4325b69fd5a4",
   "metadata": {},
   "source": [
    "## Database Creation and Connection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d124791c-f015-440b-9d64-6cb7d6c7a6c9",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "db, collection = custom_utils.connect_to_database()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5865a1f-9db7-4eb4-b768-e3353d637121",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "# Delete any existing records in the collection\n",
    "collection.delete_many({})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d616ad62-e4b2-4086-b596-14320baa5db6",
   "metadata": {},
   "source": [
    "## Data Ingestion"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b602c4d7-fd03-4fa4-be63-0b22b29724b7",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "collection.insert_many(listings)\n",
    "print(\"Data ingestion into MongoDB completed\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9567637d-9c54-4594-81b7-592f634eb7fa",
   "metadata": {},
   "source": [
    "## Vector Search Index defintion"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "061b04e8-b279-41dc-80fc-d94fcd544af8",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "# Create vector search index\n",
    "custom_utils.setup_vector_search_index_with_filter(collection=collection)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e995ebbe-3c48-4fb4-a41e-e190b5e72961",
   "metadata": {},
   "source": [
    "<p style=\"background-color:#fff6e4; padding:15px; border-width:3px; border-color:#f5ecda; border-style:solid; border-radius:6px\"> ⏳ <b>Note:</b> If the output of the previous cell is <code>Error creating vector search index: Duplicate Index</code> you may proceed to the next cell if you intend to still use a previously created index.</p>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "678beaa8-f9bb-4f60-bf21-1e56916c6da9",
   "metadata": {},
   "source": [
    "## Handling User Query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ea395473-e021-487e-a794-280655380fb2",
   "metadata": {
    "height": 251
   },
   "outputs": [],
   "source": [
    "from pydantic import BaseModel\n",
    "from typing import Optional\n",
    "\n",
    "# Note: Ensure that the  projection document in the projection stage matches the search result model.\n",
    "class SearchResultItem(BaseModel):\n",
    "    name: str\n",
    "    accommodates: Optional[int] = None\n",
    "    address: custom_utils.Address\n",
    "    summary: Optional[str] = None\n",
    "    space: Optional[str] = None\n",
    "    neighborhood_overview: Optional[str] = None\n",
    "    notes: Optional[str] = None\n",
    "    score: Optional[float]=None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1cdf3da4-0686-469b-af28-28c7e2540f2f",
   "metadata": {
    "height": 625
   },
   "outputs": [],
   "source": [
    "from IPython.display import display, HTML\n",
    "\n",
    "def handle_user_query(query, db, collection, stages=[], vector_index=\"vector_index_text\"):\n",
    "    get_knowledge = custom_utils.vector_search_with_filter(query, db, collection, stages, vector_index)\n",
    "\n",
    "    if not get_knowledge:\n",
    "        return \"No results found.\", \"No source information available.\"\n",
    "    \n",
    "    print(\"List of all fields of the first document, before model conformance\")\n",
    "    print(get_knowledge[0].keys())\n",
    "\n",
    "    search_results_models = [\n",
    "        SearchResultItem(**result)\n",
    "        for result in get_knowledge\n",
    "    ]\n",
    "\n",
    "    search_results_df = pd.DataFrame([item.dict() for item in search_results_models])\n",
    "\n",
    "    completion = custom_utils.openai.chat.completions.create(\n",
    "        model=\"gpt-3.5-turbo\",\n",
    "        messages=[\n",
    "            {\n",
    "                \"role\": \"system\", \n",
    "                \"content\": \"You are a airbnb listing recommendation system.\"},\n",
    "            {\n",
    "                \"role\": \"user\", \n",
    "                \"content\": f\"Answer this user query: {query} with the following context:\\n{search_results_df}\"\n",
    "            }\n",
    "        ]\n",
    "    )\n",
    "    system_response = completion.choices[0].message.content\n",
    "    print(f\"- User Question:\\n{query}\\n\")\n",
    "    print(f\"- System Response:\\n{system_response}\\n\")\n",
    "    display(HTML(search_results_df.to_html()))\n",
    "    return system_response"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0cc119c1-b0e5-4547-8972-a1099a1bf881",
   "metadata": {},
   "source": [
    "## Adding A Projection Stage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98b7529b-e09b-4f4f-b746-2450f0a939b7",
   "metadata": {
    "height": 404
   },
   "outputs": [],
   "source": [
    "\n",
    "projection_stage = {\n",
    "    \"$project\": {\n",
    "        \"_id\": 0,  \n",
    "        \"name\": 1,\n",
    "        \"accommodates\": 1,\n",
    "        \"address.street\": 1,\n",
    "        \"address.government_area\": 1, \n",
    "        \"address.market\": 1,\n",
    "        \"address.country\": 1, \n",
    "        \"address.country_code\": 1, \n",
    "        \"address.location.type\": 1, \n",
    "        \"address.location.coordinates\": 1,  \n",
    "        \"address.location.is_location_exact\": 1,\n",
    "        \"summary\": 1,\n",
    "        \"space\": 1,  \n",
    "        \"neighborhood_overview\": 1, \n",
    "        \"notes\": 1, \n",
    "        \"score\": {\"$meta\": \"vectorSearchScore\"} \n",
    "    }\n",
    "}\n",
    "\n",
    "additional_stages = [projection_stage]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17fe5995-2230-4b68-a22c-949537ff2056",
   "metadata": {},
   "source": [
    "## Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c36d227f-4bd2-4109-9df5-886405dccdac",
   "metadata": {
    "height": 234
   },
   "outputs": [],
   "source": [
    "query = \"\"\"\n",
    "I want to stay in a place that's warm and friendly, \n",
    "and not too far from resturants, can you recommend a place? \n",
    "Include a reason as to why you've chosen your selection\"\n",
    "\"\"\"\n",
    "handle_user_query(\n",
    "    query, \n",
    "    db, \n",
    "    collection, \n",
    "    additional_stages, \n",
    "    vector_index=\"vector_index_with_filter\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1fdc02bc-b92b-4d2f-88f5-214c69691a0b",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d86e9f0-5a7e-43fb-b855-ecf3ee2a46a5",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ecb2cff-50de-4656-bc62-1586e9a06b9a",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6d230f8-324c-4466-8efc-e02e2ea3ff30",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0318142e-32a2-4bb8-90ec-c1d059d93b95",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9dd5c80-b97a-4595-bcfa-83444684af61",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
